This file contains the Supplementary Materials for Parvatham, Tann, Pergola, & Gambi (2024). Conversational Interest between teachers and second language learners evaluated by human and LLMs.

Linguistic predictors of human interest ratings.

Concreteness

Figure S1 - Correlations between the two concreteness measures (_megahr and _mrc) and unrounded average human interestingness ratings. Correlation coefficients and significance are based on Kendall’s tau non-parametric test.

Figure S2 - Correlations between the two concreteness measures (_megahr and _mrc) and unrounded average human expected interestingness ratings. Correlation coefficients and significance are based on Kendall’s tau non-parametric test.
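
As a point of reference for the coefficients shown in Figures S1 and S2, the following is a minimal Python sketch of Kendall's tau in its tie-corrected form (tau-b). It is not the authors' code: the function name kendall_tau_b and the example values are illustrative assumptions, and the significance test is omitted.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: rank correlation with a correction for ties."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither count
        if dx == 0:
            ties_x += 1  # tied on x only
        elif dy == 0:
            ties_y += 1  # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up values for illustration only (not data from the study).
concreteness = [2.1, 2.4, 2.4, 3.0, 3.3]
mean_rating = [2.8, 2.5, 2.6, 2.0, 1.9]
print(kendall_tau_b(concreteness, mean_rating))
```

A rank-based coefficient of this kind does not assume a linear relation between the measure and the ratings, which is why it suits averaged ratings with many ties.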

Comprehensibility

Table S1 summarizes the outcome of feature-level comparisons between linear regression models containing either a linear effect only or both a linear and a quadratic effect (the linear and quadratic terms were always orthogonal to each other). Separate comparisons were conducted for each feature and for each outcome variable: average interestingness (Int) and average expected interestingness (Exp Int). A model comparison p value < .05 indicates that the model including both the linear and the quadratic predictor fits better according to a likelihood ratio test (function anova() in R); in that case the quadratic model is listed as the winning model, otherwise the linear model is. P values are rounded to four decimal places, so 0.0000 denotes p < .0001.
Table S1 - Feature-level model comparisons for comprehensibility metrics
Feature Int (p value) Exp Int (p value) Int (winning model) Exp Int (winning model)
gis 0.0000 0.0027 quadratic quadratic
syllable_count 0.0000 0.0000 quadratic quadratic
lexicon_count 0.0000 0.0000 quadratic quadratic
difficult_words 0.0000 0.0000 quadratic quadratic
flesch_reading_ease 0.0000 0.0000 quadratic quadratic
flesch_kincaid_grade 0.0002 0.0014 quadratic quadratic
smog_index 0.2275 0.3825 linear linear
coleman_liau_index 0.0000 0.0000 quadratic quadratic
automated_readability_index 0.5579 0.5556 linear linear
dale_chall_readability_score 0.0000 0.0000 quadratic quadratic
spache_readability 0.1531 0.3249 linear linear
gunning_fog 0.0000 0.0000 quadratic quadratic
linsear_write_formula 0.0000 0.0000 quadratic quadratic
mcalpine_eflaw 0.0000 0.0000 quadratic quadratic
text_standard 0.0000 0.0000 quadratic quadratic
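
The comparison logic behind Table S1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: it builds orthogonal linear and quadratic terms by hand, fits both models by ordinary least squares, and uses the classical Gaussian likelihood ratio statistic n * ln(RSS_linear / RSS_quadratic) with 1 degree of freedom. R's anova() may compute a closely related but not identical statistic, so exact values need not match. All function names and the example data are hypothetical.

```python
from math import erfc, log, sqrt


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def orthogonal_poly2(x):
    """Centered linear and quadratic terms that are orthogonal to each other."""
    lin = center(x)
    quad = center([a * a for a in lin])
    # Remove the part of the quadratic term that the linear term explains.
    slope = dot(quad, lin) / dot(lin, lin)
    quad = [q - slope * l for q, l in zip(quad, lin)]
    return lin, quad


def rss(y, predictors):
    """Residual sum of squares for OLS with an intercept.

    Valid only because the predictors are centered and mutually orthogonal,
    so each coefficient can be estimated independently of the others.
    """
    resid = center(y)
    for z in predictors:
        b = dot(z, resid) / dot(z, z)
        resid = [r - b * v for r, v in zip(resid, z)]
    return dot(resid, resid)


def lrt_linear_vs_quadratic(x, y):
    """Return (statistic, p value) for linear vs. linear + quadratic."""
    lin, quad = orthogonal_poly2(x)
    rss_lin = rss(y, [lin])
    rss_quad = rss(y, [lin, quad])
    stat = max(len(y) * log(rss_lin / rss_quad), 0.0)
    p = erfc(sqrt(stat / 2))  # chi-square survival function for 1 df
    return stat, p


# Made-up data for illustration: an inverted-U relation plus small noise.
x = list(range(1, 11))
y_curved = [-(a - 5.5) ** 2 + 0.5 * (-1) ** a for a in x]
stat, p = lrt_linear_vs_quadratic(x, y_curved)
print("quadratic" if p < 0.05 else "linear")
```

With the rule used in Table S1, a p value below .05 selects the quadratic model and any other p value selects the linear one.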

Uptake

Figure S3 shows effects of teacher_uptake_student (A) and student_uptake_teacher (B) on average interestingness as a function of whether the first speaker displayed on a page was the teacher or the student.
Figure S3A - Interestingness as a function of teacher uptake student

Figure S3B - Interestingness as a function of student uptake teacher

Figure S4 shows effects of teacher_uptake_student (A) and student_uptake_teacher (B) on average expected interestingness as a function of whether the first speaker displayed on a page was the teacher or the student.
Figure S4A - Expected Interestingness as a function of teacher uptake student

Figure S4B - Expected Interestingness as a function of student uptake teacher

Combined models

Tables S2 (Interestingness) and S3 (Expected Interestingness) show the random effect estimates for the combined models reported in Tables 8 and 9 in the main manuscript, respectively.
Table S2 - Int ~ conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | conversation_id) + (1 | AnnId)
grp var1 var2 vcov sdcor
AnnId (Intercept) NA 0.350 0.592
conversation_id (Intercept) NA 0.056 0.237
Residual NA NA 0.743 0.862
Table S3 - ExpInt ~ conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | conversation_id) + (1 | AnnId)
grp var1 var2 vcov sdcor
AnnId (Intercept) NA 0.355 0.596
conversation_id (Intercept) NA 0.042 0.204
Residual NA NA 0.729 0.854
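
In the random effect tables, the vcov column holds variances and the sdcor column holds the corresponding standard deviations, so sdcor is the square root of vcov. The Python sketch below checks this for Table S2 and also derives each component's share of the total variance. The shares are a derived illustration, not a quantity reported by the authors, and the function names are hypothetical. Because the tabled values are rounded, a recomputed standard deviation can differ from the reported one in the last digit for some tables.

```python
from math import sqrt

# Variance components (vcov column) copied from Table S2.
table_s2 = {"AnnId": 0.350, "conversation_id": 0.056, "Residual": 0.743}


def sd_from_variance(components):
    """Standard deviation (sdcor) implied by each variance (vcov)."""
    return {grp: round(sqrt(v), 3) for grp, v in components.items()}


def variance_shares(components):
    """Proportion of the summed variance attributed to each component."""
    total = sum(components.values())
    return {grp: v / total for grp, v in components.items()}


print(sd_from_variance(table_s2))
print(variance_shares(table_s2))
```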

Separate models for each feature category, including maximal random slopes that could be estimated.

These are the full outputs (fixed effects, followed by random effects) for models that combine the selected features within each of the three categories (concreteness, comprehensibility, uptake) separately; these models include the maximal random slopes that could be estimated without convergence warnings. The model formula is reported for the Interestingness model; in all cases we used the same model formula for the Expected interestingness model.

Table S4: model formula for concreteness
Feature Formula
Concreteness Int~conc + (1 | conversation_id) + ((1 | AnnId) + (0 + conc | AnnId))
Table S5: Concreteness predicting Interestingness - fixed effects
Feature Beta SE t
(Intercept) 2.108 0.069 30.496
conc -0.164 0.013 -12.987
Table S6: Concreteness predicting Interestingness - random effects
grp var1 var2 vcov sdcor
AnnId conc NA 0.010 0.101
AnnId.1 (Intercept) NA 0.364 0.603
conversation_id (Intercept) NA 0.058 0.241
Residual NA NA 0.793 0.890
Table S7: Concreteness predicting Expected interestingness - fixed effects
Feature Beta SE t
(Intercept) 1.998 0.067 29.635
conc -0.115 0.013 -8.912
Table S8: Concreteness predicting Expected interestingness - random effects
grp var1 var2 vcov sdcor
AnnId conc NA 0.011 0.105
AnnId.1 (Intercept) NA 0.364 0.603
conversation_id (Intercept) NA 0.044 0.209
Residual NA NA 0.765 0.875
Table S9: model formula for comprehensibility
Feature Formula
Comprehensibility Int~cli + si + gis_lc + gis_qc + lex_lc + lex_qc + (1 | conversation_id) + ((1 | AnnId) + (0 + gis_lc | AnnId) + (0 + lex_lc | AnnId) + (0 + lex_qc | AnnId))
Table S10: Comprehensibility predicting Interestingness - fixed effects
Feature Beta SE t
(Intercept) 2.088 0.067 31.288
cli 0.037 0.006 5.817
si 0.033 0.008 3.902
gis_lc 0.017 0.010 1.757
gis_qc -0.031 0.006 -4.811
lex_lc 0.207 0.019 10.714
lex_qc -0.166 0.018 -9.468
Table S11: Comprehensibility predicting Interestingness - random effects
grp var1 var2 vcov sdcor
AnnId lex_qc NA 0.018 0.133
AnnId.1 lex_lc NA 0.027 0.163
AnnId.2 gis_lc NA 0.003 0.056
AnnId.3 (Intercept) NA 0.345 0.588
conversation_id (Intercept) NA 0.050 0.223
Residual NA NA 0.720 0.849
Table S12: Comprehensibility predicting Expected interestingness - fixed effects
Feature Beta SE t
(Intercept) 1.978 0.067 29.748
cli 0.028 0.006 4.514
si 0.032 0.008 3.763
gis_lc 0.006 0.010 0.612
gis_qc -0.015 0.006 -2.299
lex_lc 0.167 0.018 9.083
lex_qc -0.125 0.017 -7.543
Table S13: Comprehensibility predicting Expected interestingness - random effects
grp var1 var2 vcov sdcor
AnnId lex_qc NA 0.015 0.123
AnnId.1 lex_lc NA 0.023 0.153
AnnId.2 gis_lc NA 0.003 0.056
AnnId.3 (Intercept) NA 0.359 0.599
conversation_id (Intercept) NA 0.039 0.197
Residual NA NA 0.712 0.844
Table S14: model formula for uptake
Feature Formula
Uptake Int~LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | conversation_id) + ((1 | AnnId) + (0 + LCS_proc_d_numc | AnnId) + (0 + suthlc | AnnId) + (0 + cos_within_page_c | AnnId))
Table S15: Uptake predicting Interestingness - fixed effects
Feature Beta SE t
(Intercept) 2.135 0.070 30.330
LCS_proc_d_numc 0.115 0.010 11.211
suthlc 0.040 0.009 4.560
cos_within_page_c -0.062 0.009 -6.745
Table S16: Uptake predicting Interestingness - random effects
grp var1 var2 vcov sdcor
AnnId cos_within_page_c NA 0.002 0.047
AnnId.1 suthlc NA 0.002 0.044
AnnId.2 LCS_proc_d_numc NA 0.004 0.065
AnnId.3 (Intercept) NA 0.371 0.609
conversation_id (Intercept) NA 0.065 0.254
Residual NA NA 0.785 0.886
Table S17: Uptake predicting Expected interestingness - fixed effects
Feature Beta SE t
(Intercept) 2.019 0.068 29.611
LCS_proc_d_numc 0.081 0.010 7.702
suthlc 0.033 0.009 3.654
cos_within_page_c -0.046 0.010 -4.618
Table S18: Uptake predicting Expected interestingness - random effects
grp var1 var2 vcov sdcor
AnnId cos_within_page_c NA 0.004 0.063
AnnId.1 suthlc NA 0.003 0.051
AnnId.2 LCS_proc_d_numc NA 0.005 0.070
AnnId.3 (Intercept) NA 0.369 0.608
conversation_id (Intercept) NA 0.046 0.215
Residual NA NA 0.752 0.867
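
The t values in the fixed effect tables above are the ratio of each estimate (Beta) to its standard error (SE). The Python sketch below recomputes that ratio and adds an approximate 95% Wald interval. The interval relies on a normal approximation with a multiplier of 1.96, which is an assumption made here for illustration and not part of the reported analysis; the function name wald_summary is hypothetical. Because the tabled values are rounded to three decimals, a recomputed ratio differs slightly from the reported t (for conc in Table S5, -0.164 / 0.013 gives about -12.6 rather than -12.987).

```python
def wald_summary(beta, se, z=1.96):
    """Ratio of estimate to standard error, plus an approximate Wald interval."""
    return {
        "t": beta / se,
        "ci": (beta - z * se, beta + z * se),
    }


# Beta and SE for conc, copied from Table S5.
conc_s5 = wald_summary(-0.164, 0.013)
print(conc_s5)
```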

Linguistic predictors of variance in interest ratings

Figures S5-S14 show the relation between linguistic features and variance in Interestingness ratings.
Figure S5 - Concreteness (_megahr)

Figure S6 - Lexicon count

Figure S7 - GIS score

Figure S8 - Coleman Liau Index

Figure S9 - Smog Index

Figure S10 - Automated Readability Index

Figure S11 - Spache Readability

Figure S12 - Longest Common Subsequence (LCS, processed version)

Figure S13 - Student Uptake Teacher

Figure S14 - Embeddings-based cosine similarity (raw version)

Tables S19 (Interestingness) and S20 (Expected Interestingness) report models predicting variance in human ratings.
Table S19 - Int_var ~ conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | conversation_id) + (1 | project)
Feature Beta SE t
(Intercept) 1.001 0.050 19.934
conc -0.016 0.015 -1.091
cli -0.013 0.019 -0.686
si -0.029 0.017 -1.742
ari 0.048 0.033 1.463
sri -0.020 0.027 -0.725
gis_lc 0.002 0.015 0.125
gis_qc -0.005 0.012 -0.457
lex_lc -0.011 0.018 -0.596
lex_qc 0.029 0.013 2.207
LCS_proc_d_numc -0.028 0.013 -2.088
suthlc -0.014 0.012 -1.109
cos_within_page_c -0.008 0.013 -0.609
Table S20 - ExpInt_var ~ conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | conversation_id) + (1 | project)
Feature Beta SE t
(Intercept) 1.037 0.056 18.462
conc -0.025 0.015 -1.688
cli -0.025 0.019 -1.322
si -0.009 0.017 -0.504
ari 0.046 0.033 1.381
sri -0.044 0.028 -1.581
gis_lc -0.006 0.015 -0.427
gis_qc -0.012 0.012 -1.006
lex_lc -0.012 0.018 -0.694
lex_qc 0.018 0.013 1.358
LCS_proc_d_numc -0.008 0.013 -0.606
suthlc -0.006 0.012 -0.509
cos_within_page_c -0.018 0.013 -1.355
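
The outcome in Tables S19 and S20 is the variance of the human ratings. A minimal Python sketch of how such a per-item variance could be computed is given below. It assumes that ratings are grouped by a rated item (for example, a page) and that the sample variance (n - 1 denominator) is used; neither detail is stated in this supplement, and the function name, item identifiers, and ratings are made up for illustration.

```python
from collections import defaultdict
from statistics import variance


def rating_variance_by_item(rows):
    """Sample variance of ratings for each item rated by at least two annotators.

    rows: iterable of (item_id, rating) pairs.
    """
    by_item = defaultdict(list)
    for item_id, rating in rows:
        by_item[item_id].append(rating)
    # Items with a single rating have no defined sample variance and are skipped.
    return {item: variance(r) for item, r in by_item.items() if len(r) >= 2}


# Hypothetical ratings, not data from the study.
example_rows = [("p1", 1), ("p1", 3), ("p2", 2), ("p2", 2), ("p2", 2), ("p3", 4)]
print(rating_variance_by_item(example_rows))
```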

Proficiency

Tables S21 (Interestingness) and S22 (Expected Interestingness) report models predicting human ratings from features and annotator/student proficiency.
Table S21 - Int ~ level_match_numc + student_level_nc + annotator_level_nc + conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | AnnId)
Feature Beta SE t
(Intercept) 2.130 0.070 30.526
level_match_numc 0.176 0.028 6.363
student_level_nc 0.046 0.011 4.364
annotator_level_nc 0.096 0.091 1.059
conc -0.022 0.010 -2.075
cli 0.050 0.013 3.754
si 0.028 0.012 2.387
ari -0.038 0.023 -1.637
sri 0.039 0.019 2.019
gis_lc 0.017 0.010 1.699
gis_qc -0.032 0.008 -3.795
lex_lc 0.177 0.012 14.357
lex_qc -0.104 0.009 -11.507
LCS_proc_d_numc 0.037 0.009 3.915
suthlc 0.024 0.009 2.851
cos_within_page_c -0.015 0.009 -1.612
Table S22 - ExpInt ~ level_match_numc + student_level_nc + annotator_level_nc + conc + cli + si + ari + sri + gis_lc + gis_qc + lex_lc + lex_qc + LCS_proc_d_numc + suthlc + cos_within_page_c + (1 | AnnId)
Feature Beta SE t
(Intercept) 1.996 0.072 27.860
level_match_numc 0.194 0.027 7.143
student_level_nc 0.055 0.010 5.228
annotator_level_nc 0.069 0.093 0.740
conc -0.006 0.010 -0.622
cli 0.037 0.013 2.778
si 0.022 0.011 1.956
ari -0.021 0.023 -0.930
sri 0.013 0.019 0.695
gis_lc 0.006 0.010 0.645
gis_qc -0.013 0.008 -1.583
lex_lc 0.159 0.012 13.112
lex_qc -0.076 0.009 -8.542
LCS_proc_d_numc 0.019 0.009 2.021
suthlc 0.017 0.008 2.014
cos_within_page_c -0.014 0.009 -1.535

Reward Prediction Error

Figure S15: Relation between rpe and cosine similarity between pages

Table S23: Linear effect of cosine similarity predicting rpe
Feature Beta SE t
(Intercept) 0.118 0.026 4.511
cos_proc_pages_c 0.014 0.009 1.573
Table S24: Linear and quadratic effects of cosine similarity predicting rpe
Feature Beta SE t
(Intercept) 0.118 0.026 4.511
cpp_lc 0.014 0.009 1.572
cpp_qc -0.002 0.009 -0.266

Distributions of human and model ratings

Figure S16: Distributions of human and model ratings

Correlations between model error and human variance

Figure S17: Kendall's tau correlations between model error and variance in human interest ratings